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Abstract 


Despite recent technological advances in genomic sciences, our understanding of cancer progression and its 
driving genetic alterations remains incomplete. Here, we introduce TiMEx, a generative probabilistic model 
for detecting patterns of various degrees of mutual exclusivity across genetic alterations, which can indicate 
pathways involved in cancer progression. TiMEx explicitly accounts for the temporal interplay between the 
waiting times to alterations and the observation time. In simulation studies, we show that our model outper¬ 
forms previous methods for detecting mutual exclusivity. On large-scale biological datasets, TiMEx identifies 
gene groups with strong functional biological relevance, while also proposing many new candidates for bio¬ 
logical validation. TiMEx possesses several advantages over previous methods, including a novel generative 
probabilistic model of tumorigenesis, direct estimation of the probability of mutual exclusivity interaction, 
computational efficiency, as well as high sensitivity in detecting gene groups involving low-frequency altera¬ 


tions. The R code implemented TiMEx is available at www.cbg.bsse.ethz.ch/software/TiMEx 


Introduction 


Despite recent technological advances in genomic sciences, our understanding of cancer progression still faces 
fundamental challenges. To this end, new ways of interpreting the increasing amount of generated data are 
devised, aiming at finding biologically relevant patterns. An important example is the separation of genes 
into drivers, which have a selective advantage and significantly contribute to tumor progression, and passen¬ 
gers, which are selectively neutral and can hitchhike along with fitte r clone s. Even if intuitive and routinely 
used, identifying drivers as recurrently altered genes ( Sioblom et ah . 2006h only explains tumorigenesis in a 
fraction of patients. Alternatively, the functional role of drivers can be assessed in the context of groups of 
genes, all possessing the same important function, commonly known as pathways. Once one of the group 
members is altered, the tumor gains a significant selective advantage. The alteration of additional group 
members does not further increase the selective advantage of the tumor, making genotypes with a single 
alteration likely the most frequent. In this case, the group of genes displays a mutually exclusive alteration 
pattern. 

C urrent approaches fo r detecting mutual exclusivity are ei t her d e novo (lYeang et aP. 2008Ding et al 


2008 : Vandin et al.l . 12012 : Leiserson et al. . 2013: Miller et ah . 201ll : ISzczurek and Beerenwintel . 2014 ) or 


based on biological interaction networks (ICiriello et al.l . 120121) . While highly informative, the current biolo¬ 
gical knowledge is incomplete, such that limiting the search space to known biological interactions signific¬ 
antly reduces the detection power. Straightforward pairwise statistical tests assessing whether the number of 
observed double mutants is lower than ejected by chance ha.ve als o been employed, fo llowed by ideritifying 


observed double mutants is lower tnan ejected by cnance na.ve als o been employed, to llowed by iderititying 
groups as maximal cliques ( Yeang et al.l . l2008t ICiriello et al. . 2012 ). The Dendrix tool ( Vandin et al. . 20121) 
performs a Markov Chain Monte Carlo sampling for group structure search and then a permutation test 
for finding sets of genes with both high coverage and high exclusivity. Its l imita tion of finding the single 


main pathway per dataset was addressed by Multidendrix (jLeiserson et al.l . 120131) . a follow-up tool which 
simultaneously identifies multiple driver pathways via an integer linear programming approach. Einally, 


Szczurek and Beerenwinkell ( 2014]) propose muex, a statistical model for mutual exclusivity, where, however, 
the group members are required to have similar alteration frequencies. All existing approaches ignore the 
fact that the mutually exclusive patterns occur over time, during disease progression. 
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Figure 1: Overview of the TiMEx multistep procedure for detecting mutually exclusive groups of alterations in a large dataset. 
First, from a binary alteration matrix consisting of M samples and k genes, the degree of mutual exclusivity fiij and the p- 
value for testing fiij ^ 0 against fiij = 0 are estimated for all gene pairs {i,j). Second, candidate groups are identified as 
maximal cliques of genes sharing a significant minimum degree of mutual exclusivity (satisfying the thresholds Ppair ^ud Pp^ir 
for each edge). Finally, the candidate groups are statistically tested for mutual exclusivity and the degree of mutual exclusivity 
corresponding to each group is estimated and tested for significance. 


Here, we introduce TiMEx, a generative probabilistic model for the de novo detection of mutual exclusiv¬ 
ity patterns of various degrees across carcinogenic alterations. We regard tumorigenesis as a dynamic process, 
and base our model on the temporal interplay between the waiting times to alterations, characteristic for 
every gene and alteration type, and the observation time. Under the assumption of rarity of events over 
short time intervals, TiMEx models the alteration process for each gene as a Poisson process. The waiting 
times to alterations are therefore modeled as exponentially distributed variables with specific rates, which 
correspond to the rates of evolution for each alteration. In our modeling framework, the temporal dynamics 
of each alteration process progresses from the onset of cancer, corresponding to the first genetic alteration 
responsible for the growth of a malignant tumor, up to the observation time, corresponding to the time of 
the tumor biopsy. The observation time is regarded as a system failure time, and is exponentially distributed 
with an unknown rate. 

A perfectly mutually exclusive group is defined as a collection of genes in which, for every tumor sample, 
at most one gene is altered. Conversely, we assume that in a group showing no mutual exclusivity, each 
gene is altered conditionally independent, given the observation time. In a realistic biological setting how¬ 
ever, additional alterations may still provide a small selective advantage to the tumor, rather than none 
at all, which may lead to the fixation of a genotype with more than one alteration, in a group of genes 
otherwise perfectly mutually exclusive. Thus, biologically, groups of genes display a continuous range of 
mutual exclusivity degrees. TiMEx quantifies these degrees exactly, and assesses their significance using 
a likelihood ratio test. Our procedure for efficient search for mutually exclusive patterns in large datasets 
consists of three steps (Figure [T]). We first estimate mutual exclusivity between all possible gene pairs in 
the dataset. Second, we select as candidates the gene groups in which the significance and degree of mutual 
exclusivity between each pair of members are high. Third, the candidate groups are statistically tested 
for mutual exclusivity. In simu lation studies , we sh ow that TiMEx outperf orms the permutation-based 
meth od previously introduced by Vandin et al.l ( 20121 ) and the muex model ( Szczurek and Beerenwinkell . 
20141). Furthermore, we apply our procedure to f our large TCGA studies, two glioblastoma datasets, ovarian 
(jCancer Genome Atlas Research Network . 2011 ) and breast (provisional) cancer datasets. On these data¬ 
sets, we show that TiMEx identihes gene groups with stronger functional biological relevance than the other 
two methods, while also proposing many new candidates for biological validation. TiMEx doesn’t impose 
any temporal assumptions on the set of biological samples it is applied on. These samples are considered 
to be independent. Without requiring any previous biological knowledge, our procedure identifies mutually 
exclusive gene groups of any size, statistically tests and ranks them by their degree of mutual exclusivity. It 
possesses several advantages over previous methods, including the probabilistic modeling of tumorigenesis as 
a dynamic process, the novel and intuitive quantification of the degree of mutual exclusivity as a probability, 
high computational efficiency on large datasets, as well as high sensitivity in detecting low frequently altered 
genes. 
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Methods 


Probabilistic model 

We consider n genes indexed byA^ = {1,2,...,n}, whose alteration statuses are represented by the vector of 
binary random variables X = {Xi,X 2 , ■ ■ ■, Xn), recorded at observation time Tobs Exp(Aobs)- The waiting 
times to alteration of the n genes are represented by the vector of random variables T = (Ti,T 2 ,... ,Tn), 
where Ti ^ Exp(Ai), for all* G N. For a given tumor sample, we refer to an instantiation of X, namely 
{xi,... ,Xn), as a genotype, where Xi G {0,1}, for all i G N. Moreover, for any set of indices K C N with 
cardinality li^l, we denote by gx the genotype for which positions with indices in K are equal to 1 and 
positions with indices in N \ K are equal to 0. The presence of an alteration event in a tumor sample 
signifies both its occurence in one of the tumor cells and its fixation in the measured population, such 
that the alteration is observed at screening. Let V denote M independent observations ... ,X^^'>y 



j denotes the alteration statuses of the n genes in tumor sample j, and 
denotes their corresponding waiting times. The binary variables Xi are 


observed, while Ti and Tobs are hidden. We are interested in inferring the degree of mutual exclusivity among 


the group of n genes. To this end, we compute the likelihood of the data T under the nested null and mutual 
exclusivity models introduced below. As the observations are independent, the likelihood of the data under 


any model 9 is L {6 \V) = P . 
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Figure 2: Graphical representations of (A) null and (B) mutual exclusivity models for two genes, i and j. The observed 
variables are shaded in gray, while the hidden ones are not. The binary random variables Xi and Xj denote the alteration 
statuses of the two genes, Ti and Tj are their waiting times, and Tobs is the observation time, exponentially distributed with 


corresponding parameters A. In the null model, Xi and Xj are conditionally independent given the observation time Tobs? while 


in the mutual exclusivity model, they also depend on each other via the parameter fiij, which represents their degree of mutual 
exclusivity. 


Null Model 


The null model (Figure [5]4), parameterized by 0nuU = (Ai, A 2 ,..., A„, Aobs), assumes that alterations in the 
n genes are conditionally independent from each other, given the observation time Tobs- The condition for 
observing an alteration in a gene i is that its corresponding waiting time is shorter than or equal to the 
observation time: if Ti < Tobsj then Aj = 1, otherwise Xi = 0. Hence, the dependency between the set of 
binary variables Xi is deterministic and given by the common observation time Tobs- 

The genotype is observed if the observation time is shorter than the waiting times of all alterations. 


Therefore, Tobs is the minimum of n + 1 competing exponentials ( Blumenfeld . 2009ll . and 



( 1 ) 
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Any genotype qk with AT ^ 0 is observed if the waiting times of alterations present in the sample, 
Ti, for all i G AT, are shorter than the observation time, and the waiting times of alterations not present in 
the sample, Ti, for alH S iV \ AT, are longer than the the observation time. 


P {gx I ^Nuii) = P 


(^maxT, <Tobs< min T, 
\i^K iGN\K 


( 2 ) 


The probability that the observation time is shorter than the waiting times of unobserved alterations is not 
influenced by the specific order between the waiting times of those alterations. Therefore, P {gK \ ^*Nu 1 i) fur¬ 
ther equals the sum of the probabilities of all possible specific orders of waiting times of observed alterations. 
Let Sk = {(*(t(i);* cr( 2 )) ■ • ■ )*(T(|if|)) I ^ AT for all j and cr G represent the set of all permutations 

of indices in AT, where Sfc is the symmetric group of degree k. By recursively using the expression of the 
probability of the minimum of competing exponentials, the probability of observing the genotype gx is 


P (gx I ^'nuu) = ^ P (pn < Ti 2 ■ ■ ■ < T^k\ ^ ^obs < . min TA 

/. . \ r. ^ iGN\K J 

[l\A2,---A\K\)^OK 

\ A 

_ ^obs TT 

Aobs + Eiew\if Ai K + Aobs + J2^eN\x 


( 3 ) 


As the observations T) contain no temporal information, the model 0 nu 11 is unidentifiable. 

Proposition 1. The null model O^uii = (Ai, A 2 ,..., A„, Aobs) is identifiable only up to Xobs- 

For the proof, see Supplementary Methods. After setting Aobs = 1 (without loss of generality), equivalent 
to scaling the waiting time rates by Aobs, the reparametrized null model ^nuII = (Ai, A 2 ,..., A„) becomes 
identihable. 


Mutual Exclusivity Model 

In the mutual exclusivity model (Figure [2j3), the n genes are assumed to contribute to the same biological 
function, such that, up to various degrees of mutual exclusivity, only one member is necessary and sufficient 
to be altered for cancer to progress. An increasing mutual exclusivity interaction in the group directly 
leads to an increasing fixation probability of a single alteration, corresponding to the gene with the shortest 
waiting time. The degree of mutual exclusivity of a group of n genes with indices in N, denoted by fix, is 
the probability that the group is perfectly mutually exclusive, px can also be interpreted as the fractional 
increase in the fixation probability of the genotypes with a single alteration, when more than one gene in 
the group were altered before observation time, but, due to the mutual exclusivity interaction between the 
genes in the group, only the one with the shortest waiting time fixates. The fixation of alterations of further 
genes is suppressed with probability px. Consequently, 1 — px represents the probability of deviating from 
perfect mutual exclusivity, and for px —>■ 0 , the mutual exclusivity model is reduced to the null model. 

The mutual exclusivity model is parametrized by 0 me = (Ai, A 2 ,..., An, Aobs, Aw)- probability of 
observing the genotype <70 is the same as in the null model, as the lack of fixated alterations is uninformative 
for detecting mutual exclusivity. 


P (30 I 0me) = P 


Fobs < min F 
iGN 


Aobs 


Aobs + X)ieAr Ai 


( 4 ) 


Any genotype gx with a single alteration, i.e. with |Ar| = 1, can be observed either because F is a mutually 
exclusive group, or because, by chance, the process of tumorigenessis itself has been observed at the specific 
point in time when only the alteration with the shortest waiting time in K had fixated. Hence, the probability 
of observing gx is the weighted sum of the marginal probability that Tx is simply the shortest waiting time 
among all waiting times and the probability that the observed alteration pattern happened in the absence of 
mutual exclusivity interaction between the genes. The first term represents the probability computed under 
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perfect mutual exclusivity and is weighted by /tat, while the second represents the probability computed 
under the null model, and is weighted by 1 — 


' [qk \0me) = ^J-NP (tk < min (T^, Tobs)) + (1 - MAf) ^ < T’obs < min 

\ ieN\K j \ ieN\K J 


^k\keK ^ohs + K ~ ^k;kGK) 

•^obs ~t“ -^obs “t“ ^j^A/ ^k;kGK 


(5) 


Furthermore, observing any genotype qk with \K\ > 2, i.e., any genotype with more than one alteration, is 
considered a deviation from perfect mutual exclusivity. The probability of observing each extra alteration 
equals the probability that its waiting time is shorter than the observation time, weighted by 1 — the 
probability of violating perfect mutual exclusivity. 


P {9k I ^'me) = (1 - 9'n) P fmaxTi < < min 

\^i£K i^N\K J 


= (1 - MAf) X! 


■'obs 


1^1 

n 


Aobs + J2^eN\K jJi J2\=\ K + ^Obs + EiGAr\if 


( 6 ) 


Proposition 2. The mutual exclusivity model 9me = (Ai, A 2 ,..., A„, 9-NT^obs) is identifiable only up to 
A 065 ■ 

For the proof, see Supplementary Methods. Similarly to the null model, after setting Aobs = 1 (without loss 
of generality), the reparametrized mutual exclusivity model 0 me = (Ai, A 2 ,..., Xn, Tn) becomes identifiable. 


Parameter estimation and testing 


The maximum likelihood estimates of all parameters are obtained by setting to zero the corresponding first 
derivative o f the o bserved log likelihood, numerically approximated using the gradient projection method 
I 995 I I (Figures SI and S2). An exception is the case n = 2, which allows for an analytical 


(Bvrd et al 


solution for 6 *nuU (the estimates are given in Supplementary Methods). 


Proposition 3. If n = 2, then there exists a closed-form solution for the maximum likelihood estimates of 
Al and X 2 under the null model 9nuII- 


To test for any degree of mutual exclusivity interaction among the n genes, we are testing the alternative hypo¬ 
thesis 0 versus the null hypothesis /tat = 0. The logarithm of the ratio of the tw o likelihoods computed 
for th e maximum likelihood estimates is distributed with one degree of freedom ( Nevman and PearsonL 
I 1992 II . The likelihood ratio test statistic is well behaved, as under the null hypothesis, the p-values are 
uniformly distributed (Figure S3). 


Overall procedure and computational complexity 

Our procedure consists of three steps. Given a large dataset of k genes, we first test all ( 2 ) pairs for 
mutual exclusivity, estimating 0 nuU and 0 me for each pair. The computational complexity of this step is 
0{k‘^). Second, we construct an undirected graph in which genes are vertices and an edge is drawn between 
any pair {i,j) if, for chosen thresholds Ppair and /Tpair, the p-value pij < Ppain and the degree of mutual 
exclusivity piij > ^pair- The thresholds are chosen based on the sensitivity and specificity levels to which 
they correspond, as assessed in simulated data. Further, we produce group candidates by listing all maximal 
cliques in the constructed graph. To this end, we use the Bon-Kerbosch recursive backtracking algorithm 
( Bron and Kerbosch . 19731) . The upper bo und on the running time of the Bon-Kerbosch algorithm is 0(3^^^). 


However, in practice, it is highly efficient (iCazals and Karande . 


. Finally, we test the candidate groups 


for mutual exclusivity, and select the ones for which the Bonferroni corrected p-value is lower than a chosen 
cutoff. Due to the cubic complexity of matrix inversion (in standard Gauss-Jordan elimination) employed by 


19951 ). the complexity of the last step has an upper bound 


the numerical optimization routine (|Bvrd et al. 
of 0{sqq^), where q is the maximal identified clique size, and Sq the number of such cliques of this size. 
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Figur e 3 : Mean p-value (over 100 simulation runs) for TiMEx, Fisher’s exact test, and the permutation test in IVandin et al.l 
ll2Q12l 'l. for different sample sizes M, pairwise degrees of mutual exclusivity waiting time rates determining the 

marginal frequencies of the two genes, Ai and A2. TiMEx is highly sensitive in detecting mutual exclusivity, and outperforms 
the other two tests, which only start detecting mutual exclusivity for M{1,2} ^ The detection capacity increases with 

increasing values of /i{i,2}5 '^2- 


Results 


Simulations 

We assessed the behavior and performance of TiMEx on simulated data, by varying the waiting time rates of 
the genes, the degrees of mutual exclusivity of the group, and the sample sizes. Specifically, the values of the 
sample size M were 300, which is similar to the size of the ovarian cancer dataset, 1000, similar to the size 
of the breast cancer dataset, and 4000, which is a realistic estimate for the size of genomic datasets in the 
near future. The degrees of mutual exclusivity used for simulations were /r S {0, 0.2, 0.4, 0.5, 0.6,0.8,1}. We 
compared the ability of mutual exclusivit y detection of TiMEx , for both pairs and groups, with a previously 
introduced permutation-based method in IVandin et al. ( 2012 1 (ran with 1000 permutations). Eor the tests 
on pairs, TiMEx was further compared with one-sided Fisher’s exact test for contingency tables, testing 
whether the number of double mutants is significantly lower than exp ected under independence. For th e 
test on larger groups, TiMEx was additionally compared with muex ( Szczurek and Beerenwinkell l2014h . 
a previously introduced statistical model for detecting mtuually exclusive groups. In a power analysis, 
we investigated how the sensitivity and specificity of our procedure are influenced by the thresholds on 
significance and mutual exclusivity degree, Ppair and /Tpair. 


Test performance for pairs and groups 

For simulating mutually exclusive gene pairs, we used Ai, A 2 G {0.02, 0.1, 0.5,1,3}, corresponding to marginal 
frequencies of the two genes ranging from 2% to 75% in the null model and 0.5% to 75% in the mutual 
exclusivity model with Ai{i, 2 } = 1 (Tables SI and S2). We performed 100 simulation runs, detected pairwise 
mutual exclusivity with the three tests, and recorded the mean p-value (Figure [3|). In the case where 
p,{i 2 } = 0, corresponding to lack of mutual exclusivity, all three tests do not reject the null hypothesis, with 
a p-value close to I, for all tested combinations of frequencies and sample sizes. TiMEx is the only test 
that starts detecting mutual exclusivity from the first non-zero value of /r{i, 2 } in ths chosen simulation set, 
however with reduced performance for small sample size and small frequencies of both genes. The detection 
capacity increases with increasing values of Af, Ai, and A 2 . For example, for a chosen significance 

level of 0.05 and a sample size of M = 4000, TiMEx detects the gene pairs as being mutually exclusive for 
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Figure 4: Summa ry pvalue ( over 100 simulation runs and 10 s imulated groups of size 5) of TiMEx, the permutation test in 
IVandin et al.l l|2012h and muex llSzczurek and Beerenwinkell . l2(H3l . for different sample sizes M and degrees of mutual exclusivity 
/rjv- TiMEx is highly sensitive in detecting mutual exclusivity, and outperforms both the other two methods. The permutation 
test only detects mutual exclusivity for /rjy ^ 0-8 and outperforms muex, which only detects pure mutualy exlusivity {fipf = 1). 
The detection ability of TiMEx improves with increasing values of M and /rjv- 


any value of /t{i, 2 } ^ 0-4 and for any Ai, A 2 > 0.1. For higher marginal frequencies such as, for example, 
corresponding to Ai,A 2 > 0.5, we can detect mutual exclusivity of degree ftji 2 } > 0-4 for sample sizes as 
low as M = 300. By contrast, Fisher’s exact test and the permutation test in IVandin et al.l (|2012r) . while 
performing highly similarly to each other, detect no mutual exclusivity for < 0.5. Moreover, for 

M{i, 2 } > 0.5, their detection ability is much reduced compared to TiMEx. The null model used in Fisher’s 
exact test is a classical independence model, while the waiting times in our null model are not statistically 
independent, even with fixed rate of the observation time Aobs- 

For simulating mutually exclusive groups, we fixed the group size to 5 and produced 10 different groups 
by uniformly sampling waiting time rates with values between 0.01 and 1, which corresponds to an expected 
alteration frequency of 14% (Table S3). We performed 100 simulation runs, detected mutual exclusivity 
with TiMEx, the permutation test, and the muex model, and summarized the p-value over both different 
simulated groups and simulation runs (EigureS]). Similarly to the case of pairs, the detection ability of 
TiMEx increases with increasing sample size and degree of mutual exclusivity. Eor a significance level of 
0.05, we detect mutual exclusivity for almost all tested sample sizes and mutual exclusivity degrees, with 
the exception of small sample size M = 300 and low degree of mutual exclusivity = 0.2. Eor the highest 
sample size, M = 4000, TiMEx is very sensitive in detecting mutual exclusivity for any tested positive degree, 
with a mean p-value < 10“"^. By contrast, the permutation test only starts detecting mutual exclusivity for 
> 0.8, however outperforming muex, which only detects pure mutual exclusivity (/iAr = 1). On data 
simulated using /r^r = 0, all three tests do not reject the null hypothesis with mean p-value > 0.6. 

In addition to assessing the detection ability of TiMEx and other methods on data simulated from TiMEx, 
we conducted simulations on datasets generated more generally. We generated groups of n = 5 mutually 
exclusive genes by varying the sample size as before, the coverage, i.e. the percentage of patients which 
have at least one gene altered, among {0.2,0.4,0.6, 0.8}, and the probability of passenger alterations among 
{0.001,0.01,0.1}. Depending on the coverage, before adding noise, at most one gene was altered in each 
patient, which rendered the group perfectly mutually exclusive. Passenger mutations were further added 
to each patient with the chosen probability. On all datasets, TiMEx outperforms the permutation test and 
muex, and always records lower p-values (ranking not shown for p-values lower than 10“^°) (Figure S4). 
All three methods perform better with increasing sample size, increasing coverage, and decreasing passenger 
probability, and, for most of the tested values, they significantly detect the group as mutually exclusive. 
For TiMEx, we also estimated the degree of mutual exclusivity corresponding to the generated groups 
(Figure S5). For the very low passenger probability of 0.001, the inferred degree of mutual exclusivity is 1, 
as the expected number of passenger mutations per dataset is very low, especially for small sample sizes. 
The lowest inferred degree of mutual exclusivity is 0.7, corresponding to small coverage and small sample 
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size. The estimated fiN increases with increasing coverage and decreasing passenger probability, and the 
estimation improves with increasing sample size. 

Power analysis 

For assessing the true and false positive rates of our procedure, we constructed 100 datasets consisting 
of two groups: a group of size 3 simulated from the mutually exclusive model, and a group of size 9 
simulated from the conditionally independent model, with A values sampled uniformly between 0.01 and 1 
(Table S3). We tested all pairs with TiMEx, detected maximal cliques as candidates, and evaluated them 
with TiMEx. We considered a detected group to be mutually exclusive if its Bonferroni corrected p-value 
was lower than 0.1. We computed the true positive rate by counting a single time, among the detected 
mutually exclusive groups of size at least 3, all edges only connecting two genes part of the true mutually 
exclusive group, and normalizing by the number of all possible such edges. Similarly, we computed the false 
positive rate by counting a single time, among the detected mutually exclusive groups of size at least 3, 
all edges not connecting two genes part of the true mutually exclusive group, and normalizing accordingly. 
TiMEx performs generally very well in reconstructing the implanted mutually exclusive group (Figure S7). 
The highest effect in increasing the true positive rate and decreasing the false negative rate was given by 
increasing the mutual exclusivity degree of the simulated group. For values of the threshold /ipair >0.5, the 
false positive rate was often set to 0, and most of the times reduced by at least 75% as compared to the case 
when yitpair = 0.2. However, for small degrees of mutual exclusivity of the simulated group, the true positive 
rate was often reduced simultaneously with the false positive rate. 

Additionally, on the same simulated datasets, we analyzed how often the true mutually exclusive group is 
also the top ranked group by corrected p-value (Figure S8). The threshold /Xpair largely impacts the detection 
performance, while the impact of Ppair is neligible. For high degrees of mutual exclusivity of the true group, 

> 0.8, the real group is either top ranked, or a strict subset of the top ranked one, depending on the 
value chosen for /Tpair. Optimal performance is achieved for yitpair = 0.5, corresponding to a percentage of 
between 50% and 100% of datasets for which the true group is top ranked. Moreover, if the true group is 
perfectly mutually exclusive {p,N = 1), it is top ranked in more than 90% of the datasets for medium values 
of /ipair, for any value of Ppair, and for medium sample sizes. For lower values of /ipozr, the true group is 
a strict subset of the top ranked group. For low degrees of mutual exclusivity, either no significant groups 
are identified, or the top ranked group is not the real group. The detection power improves with increasing 
sample size and increasing degree of mutual exclusivity /i^v 


Biological datasets 

We ran our procedure on four bi ological datasets: t he two glioblastoma d atasets preprocessed by muex 
( Szczurek and Beerenwinkel . 2014ll and Multidendrix ( Leiserson et ah . 2013 b and two datasets downloaded 
from TCGA and preprocessed as explained in Section S2.3: breast cancer and ovarian cancer. Our main 
interest was detecting gene groups with average or high degree of mutual exclusivity and minimizing the false 
positive rate, while maintaining the true positive rate at a high level. Therefore, based on the sensitivity 
and specificity estimates in simulated data (Figures S7 and S8), we set //pair = 0.5 and Ppair = 0.01 for the 
four datasets. A detected group was considered significantly mutually exclusive if its Bonferroni-corrected 
p-value (q-value) was less than 0.1 (Figure S9). In order to test the stability of the identified groups, we 
subsampled the set of patients at different frequencies: 30%, 50%, and 80%, and repeated the procedure 
100 times, reporting how often each group is still identified as mutually exclusive (Tables S4-S15). Among 
the identified groups of any size, we further computed the most stable subgroups. For mutually exclusive 
groups with high enough alteration frequencies, higher stability indicates stronger mutual exclusivity support 
in the data. For each group size, w e tested the first 10 groups ranked by q-value for pathw ay enrichment 
with WebGesalt ( Zhang et al.l . 12005 ) on the Pathways Gommons dataset ( Gerami et ahlboilll . and reported 
all significantly enriched pahtways for a BH-corrected p-value threshold of 0.01. On all four datasets, we 
compared our results with two other methods: Multidendix, an algorithm based on the permutation test 
we used for comparison on simulated data, and muex (Tables S31-S36). Section SI discusses the mutually 
exlcusive groups identified in the two glioblastoma datasets. 


















Figure 5: The alteration frequencies of selected mutually exclusive groups, as identified by our procedure. The horizontal axis 
displays the members of each group, together with their relative frequency in the dataset, as well as their alteration type {Mut 
for mutation, and CNA for copy number aberration). A black line is drawn whenever an alteration is present in a sample. (A): 
the group consisting of the deletion of PTEN, the amplification of CDK4, and the point mutations EGFR and NFl (q-value 
3e-10) was the most stable among the groups of largest size ( 89% recovery at subsa mpling 80% of the patients) identified by 
TiMEx in the glioblastoma dataset used by Multidendrix in iLeiserson et S] (l2013h . (B): the group consisting of the point 
mutations of CDHl, MAP3K1, GATA3, and the copy number aberrations of CDKNIB, MIENl (q-value le-23) was the most 
significant group of largest size identified by TiMEx in the breast cancer dataset. (C): the group consisting of the point mutation 
of BRCA2 and the copy number aberrations of RBI and CCNEl (q-value 5e-09) was the most significant group of largest size 
identified by TiMEx in the ovarian cancer dataset. 


Mutual Exclusivity in Breast Cancer 


In the breast cancer dataset (Figure[5]), we found 63 groups of size two, 416 groups of size three, 96 groups 
of size four, and 10 groups of size five. Since all the 10 largest groups contained one gene with frequency 
less than 10%, these groups were highly unstable to subsampling, even if they corresponded to functionally 
related collections of genes (Table S13). The first three groups with the lowest q-value consisted of the point 
mutations of the tumor suppressors CDHl, GATA3, MAP3K1, the copy number aberration of CDKNIB, 
which belong to pathways including PI(3)K, mTOR, PDGF receptor signaling network, or EGF receptor 
(ErbBl) signaling, and the copy number aberration of one of the following genes: MIENl, PPPIRIB, or 
ERBB2. MIENl is an oncogenic protein, whose overexpression fun ctionally enhance s migration and invasion 
of tumor cells via modulating the activity of the P1(3)K pathway ( Hsu et all . l2012ll . providing evidence for 
the functional relation between these genes. Moreover, the PPP1R1B-STARD3 chimeri c fusion transcrip t 
was shown to activate the PI(3)K/AKT signaling pathway and promote tumorigenesis ( Yun et Tl l2013[l . 
while ERBB2 is an oncogene that also belongs to the PI(3)K and mTOR pathways. The next two mutually 
exclusive groups of size five included the same three point mutations, the copy number aberration of MIENl, 
and the copy number aberrations of either B4-GALNT3, which has no known functional role in breast cancer, 
or GRB7, which is part of the Common group of pathways. The first groups of size four with lowest q-value 
consisted of the point mutations of CDHl, MAP3K1, TP53, and GATA3, and was entirely mapped to the 
Common group of pathways, as well as to the CDG42 signaling events pathway (Table S12). The second 
and the third group included, instead of the GATA3 point mutation, the copy number aberration of either 
TUBDl or INTS4. Even though strong evidence of association for these two genes and the group of three 
point mutations exists in the data, TUBDl and INTS4 have no known functional role in cancer. The 
subgroups with highest subsampling stability (Tables S22-S24) consisted of genes with known functional 
involvement in cancer, such as GATA3, PIK3CA, or PTEN. 

We separately ran our procedure on the subset consisting of 507 samples annotated as distinct breast 
cancer subtypes (Tables S27-S30). Some of the top ranked mutually exclusive relations identified based 
on the entire dataset were also identified based on the subsets of data belonging to Her2, LuminalA, and 
LuminalB subtypes. None of the alterations identified in the top ranking groups were specitic to the Basal 
subtype (Table S27). For example, the connections between one of the point mutations of PIK3CA or 
CDHl, or the copy number aberration of PTEN, and the copy number aberrations of one of ERBB2, GBR7, 
MIENl, PNMT, or PPPIRIB were also mutually exclusive in the Her2 subtype (Table S28). Similarly, 
the mutually exclusive group consisting of the point mutations MAP3K1, GATA3, and TP53 was identified 
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in the LuminalA subtype (Table S29), while the group including the point mutations PIK3CA, TP53, and 
GATA3 was LuminalB subtype-specific (Table S30). Also, TUBDl, a gene part of mutliple groups, was 
mutually exclusive with the point mutation of MAP3K1 in LuminalA, and with the point mutations of 

PIK3CA and TP53 in LuminalB. _ _ 

We ran Multidendrix on the breast cancer dataset, using a = 2.5 (as suggested in lLeiserson et al. ( 20131) 1. 
t = 4, and a range of values (Table S34). Multidendrix identified with highest weight the core group 
including the point mutations of TP53, GATA3, and MAP3K1, however in the same group as the point 
mutations of GTGF and PLXNB2, which are not part of any of the known functional pathways. On the 
contrary, TiMEx identified these three point mutations in a common module with the point mutation of 
GDHl. Similarly, the next two modules ordered by weight only contained three genes in known functional 
pathways, as assessed by WebGestalt, on the Pathway Commons database (data not shown). The fourth 
module identified by Multidendrix contained no signifincat pathways, muex did not scale to the size of 
dataset, and none of the top 30 groups of any size identified by TiMEx were found significantly mutually 
exclusive by muex’s statistical test. 


Mutual Exclusivity in Ovarian Cancer 

In the ovarian cancer dataset (Figure [5]), we identified 24 mutually exclusive groups of size two and 24 
groups of size three. The top ranked group of size three (Table S15) included three genes part of the 
FOXMl transcription factor network, an d involved in cell cycle regulation, recently show n to play a major 
role in the progression of ovarian cancer (ICancer Genome Atlas Research Network) . 120111) : the copy number 
aberrations of the tumor suppressor gene RBI and the oncogene GGNEl, and the point mutation of the 
tumor suppressor gene BRGA2. The subgroup consisting of RBI and GGNEl was also the most stable to 
subsampling (Tables S26). Among the top five groups of size 3, the one which was most stable to subsampling 
included core members of the ATM pathway: the point mutations of BRGAl and BRGA2, and the copy 
number aberration of GGNEl. These two modules have also been p reviously iden t ified a s mutually exclusive 
by MEMo, an algorithm for detecting mutually exclusive groups ( Ciriello et ah . 2012f) . The following top 
scoring groups of size three included the copy number aberrations of MYG and GGNEl, two members of cell 
cycle regulation pathways involved in the Gl/S phase transition, also identified by MEMo, together with the 
copy number aberration of one gene with yet unknown functional role in ovarian cancer: WNKl, NINJ2, or 
B4GALNT3 (also identified in breast cancer). The top ranked mutually exclusive pair, which was also the 
most stable (identified 54% of the times when subsampling 80% of the patients) included KRAS and TP53 
point mutations, which are part of the p75 NTR receptor-mediated signalling pathway (Table S14). The 
second mutually exclusive pair included the point mutations of TP53 and RBI, both part of the TGFBR 
and p53 pathways. 

We ran Multidendrix on the ovarian cancer dataset, using t = 4 and a range of A:niax values (Table S35). 
The groups identified by TiMEx and Multidendrix showed a high overlap. For example, the top ranking 
groups identified by TiMEx, i.e. the pair including the point mutations of TP53 and KRAS, and the group 
including the copy number aberrations of RBI and GGNEl, together with the point mutation of BRGA2, 
were also identified by Multidendrix. Moreover, subsets of most of the group members that Multidendrix 
identified for e.g. fcmax = 5 were identified by TiMEx as groups of size three, such as the point mutations 
of BRGAl and BRGA2 and the copy number aberration of EPHX3. Even though muex did not scale to 
exhaustively analyze the dataset for groups, 14 of the pairs identified by TiMEx and 4 of the groups of 
size three were found to be significant by muex (Table S36). Almost all the pairs included either the point 
mutation of BRGA2 or the copy number aberration of NFl, while the larger groups included genes mapping 
to relevant pathways, among which many had also been identified by Multidendrix. The reason why these 
groups are also found to be mutually exclusive by muex is the fact that the alteration frequencies of their 
members are balanced. 


Discussion 

We have introduced TiMEx, a probabilistic generative model for detecting mutual exclusive patterns of 
various degrees across carcinogenic alterations, and an efficient multistep procedure for identifying all mu¬ 
tually exclusive groups in large datasets. TiMEx is the first method that describes the mutual exclusivity 
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property as a consequence of a dynamic process in time. Unlike previous de novo approaches, TiMEx infers 
functional relations between genes based on an underlying temporal representation of the process of gene 
alteration in tumorigenesis. Moreover, TiMEx is a probabilistic generative model, providing a natural way 
of rigorously quantifying the degree and significance of mutual exclusivity of a group of genes. Furthermore, 
to the best of our knowledge, TiMEx is the first method inferring a continuous range of mutual exclusivity 
degrees. Biologically, the small, but observable, increase in tumor fitness due to multiple alterations in a 
group of functionally related genes su pports the hypothesi s that mutual exclusivity occurs at various degrees. 


as opposed to a binary classification (|Ciriello et al.l . 120121) . Unlike most other approaches, TiMEx does not 


explicitly impose constraints on frequencies of alterations, in order to identify them as mutually exclusive. 
Our procedure detects both high frequent and very low frequent alterations, only based on the temporal 
relation between them. Finally, it identifies all mutually exclusive gene groups of various, not pre-defined 
sizes, and performs highly efficiently on large datasets. 

TiMEx is however still a simplified representation of carcinogenesis. Given a particular order between the 
waiting times of the genes and the observation time, the probability of violating mutual exclusivity, 1 —/rjv, is 
independent of how many, or which alterations are in a group. One natural extension of TiMEx would be to 
consider an incremental penalty for additional point alterations violating perfect mutual exclusivity, hence 
increasing the probability of being in a non mutually exclusive state with increasing number of violating 
alterations. Additionally, even if highly efficient, the search for mutually exclusive gene groups is heuristic, 
and depends on the thresholds Ppair and /Tpair. With overly stringent thresholds, too few candidates would 
be proposed, while using overly permisive thresholds would lead to selecting as candidates a vast number 
of subsets, making the procedure intractable. To address this, we propose setting the thresholds following 
the desired sensitivity-specificity tradeoff as assessed in simulations. Moreover, the functional role in tum¬ 
origenesis that specific genes might have can be analyzed in higher detail by simply including different point 
mutations of the same gene as separate alterations. 

The exponential distribution, used for mo deling the waiting times to alteratio ns and the observation time, 
is a typical choice to describe waiting times ( Gerstung and Beerenwinkel . 20ldl . both due to its generality 
and to its mathematical convenience. While the exponential distribution is the simplest model for system 
failure time, other families of distributions for modeling the observation time can be readily integrated into 
our mathematical framework, with nevertheless the cost of more involved mathematical formulas. For ex¬ 
ample, using the Weilbull distribution provides a supporting assumption in modeling cancer progression due 
to the fact that the instantaneous probability of occurrence of an event changes with time. However, the 
superiority of such choices would need to be evaluated in future applications. Another extension of TiMEx 
is renouncing to the independence assumption at the level of observations, and applying our procedure to 
large-scale time series data of tumor progression. Once this type of data becomes available, TiMEx will 
facilitate a more detailed understanding of pathways involved in tumor progression. 

In simulation studies, TiMEx outperforms previous methods for detecting mutual exclusive groups, show¬ 
ing high sensitivity even at low degrees of mutual exclusivity and scaling very well to sample sizes of several 
thousands tumors, which is expected to be soon reached by cancer genome sequencing studies. On biological 
datasets, most of the top ranked mutually exclusive groups identified by TiMEx have stronger functional 
biological relevance than the groups identified by previous methods. In conclusion, results on both simu¬ 
lated and biological data clearly indicate that TiMEx is not only theoretically justified by its biological and 
probabilistic foundation in describing tumorigenesis as a generative process of mutually exclusive alteration 
patterns, but is also efficiently and fruitfully applicable in practice. 
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